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1. INTRODUCTION 

Sign language (SL) is the primary language and can be considered as the mother tongue for the HSI 
people. Many people who are born deaf learn sign language as their primary language, and it remains their 
preferred, or first, language. There is no written form of sign language, so deaf people communicate using 
reading and writing in their second or less preferred language. Therefore, a significant proportion of deaf 
people have a strong preference for accessing information in sign language rather than written text. Sign 
language only needs some important words compared to spoken language [1], [2]. Generally, sign language 
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uses subject, verb, noun, and adverb. There are no other suffixes, prefixes, and particles. It is a non-verbal 
language that uses hand movement, hand orientation, face expression, head movement, posture, and body 
orientation [3]. Since sign language is a non-verbal language, the understanding of sign language has been 
compulsory for HSI people to communicate. 

The awareness of sign language for non-HSI people is little, or many do not know sign language. 
Therefore, it provides an obstacle in communication in the community, especially if it needs interaction 
between non-HSI and HSI people. As obstacles arise in contact with the community, the communication 
bridge must fill the gap between them. The options are sign language translator and sign language synthesizer 
technology, translating spoken language to sign language [4]. 

Using a sign language translator to communicate between non-HSI and HSI has been limited since 
sign language translators are limited in Malaysia. As early as 2017, there are only less than 100 certified SL 
translators to cater to more than 30,000 persons of HSI (references). While in the world, the world federation 
of the deaf reported that there are about 70 million HSI people [4] and 138 living sign language, which is 
according to the ethnologue catalog [5]. 

Sign language synthesizer consists of three main modules, i.e., the voice recognition module, 
language processing unit module, and signing module. Each module has its components and algorithms 
which need a different approach to development. In this paper, the main focus is on the language processing 
module, which transforms the input language. The language processing module alters input language into 
output language that is suitable for output sign language. The input and output language are in the sequence 
of words (text), in which some methodology is required to do the transformation process properly. 

The development of language processing units has been made and implemented in many different 
sign languages, for example, American sign language [6], British sign language [7], South African sign 
language [8], and Australian sign language [9]. However, in Malaysia, the language processing unit has not 
been implemented as an integral part of the sign language synthesizer. Furthermore, the language processing 
unit for bahasa isyarat Malaysia has not been implemented. A comprehensive review of the existing work 
and proposed work on the language processing unit is presented in this paper. In addition, various methods 
such as edit distance, natural language processing, HMM methods, and Bayesian network are discussed. 


2. LITERATURE REVIEW 

This section reviews the techniques of the language processing unit. The technique for the language 
processing unit provides a literature background for the language processing unit using natural language 
processing. 


2.1. Natural language processing (NLP) 
2.1.1. NLP basic processing 

The necessary process of Natural language processing (NLP) that can be used for SL synthesizer is 
the most straightforward technique which has been implemented [10]. This technique only involves three 
basic operations, i.e., POS (part of speech) tagger, optimizer, and stemming. Figure 1 shows the step of this 
technique. 


Voice Animated 
Input POs Tagger | Animated 
Avatar 


Figure 1. NLP basic processing [4] 





The initial stage is the POS tagger, which involves morphological analysis. Then, as the POS 
tagger's output, the optimizer takes part in the following step to remove the unnecessary words. Finally, 
before the output is given to the animation step [11], the stemming process is involved in finding the words' 
primary form. 


2.1.2. NLP with gloss-based approach 

The gloss-based approach is a method that associates the words and their meanings through a 
dictionary [12]. The order of the language grammar defines the order of the glosses. In a report by Almeida 
et al. [13], the order of blocks (glosses) is calculated according to Portugal sign language (LGP) grammar. As 
the final step, the order of blocks was converted into the sign language order. Figure 2 shows the gloss-based 
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approach technique by Almeida et al. [13]. The text was associated with the dictionary (database) to be 
processed in a natural language processing block. In addition, the output was translated to a sequence of 
glosses and actions. 

A stemmer can be used to identify the stem and relevant suffixes (and prefixes), which allows 
inferring, for instance, the gender and the number of given words. A part-of-speech (POS) tagger can also 
contribute to the translation process, which couples with the stemmer in the identification of the different 
types of affixes. In addition, a POS tagger usually feeds further processing, for instance, named entity 
recognizers and syntactic analyzers. A named entity recognizer allows identifying persons' names and a 
syntactic analyzer to determine the sentence's syntactic components, such as subject and object. 







Database Natural Language Processing 


(Gloss Translate) 


sequence of glosses 
Animate 
sequence of actions 


Figure 2. NLP with gloss based technique [13] 


2.1.3. NLP with rule-based and statistical translation 

The rule-based translation is a strategy that analyses the word's input until a group of words 
(sentence) [14]. The translation analysis finds specific combinations of words or signs (blocks) that generate 
a sign. The finding process starts from each word individually and extends the analysis to neighborhood 
context words or already-formed signs. There are two steps involved in the translation process. In the first 
one, every word is mapped to one or several syntactic pragmatic tags. The translation module then applies 
different rules that convert the tagged words into signs through grouping concepts or signs (blocks) and 
defining new signs. These rules can define short and extensive scope relationships between the concepts or 
signs. At the end of the process, the block sequence is expected to correspond to the sign sequence resulting 
from the translation process. The rule-based translation module provides the translation rules for the 
translation process. San-Segundo et al. [15] provided the evaluation tools for performance measures. Three 
available measurement tools have been considered: sign error rate (SER), position independent rate (PER), 
and bilingual evaluation understudy (BLEU). 

The statistical translation method calculates the probability between the word sequence and sign 
sequence stored in a database as the reference [16]. One of the methods in statistical translation is phrase- 
based translation [15], [17]. Figure 3 shows the diagram of the phrase-based translation module used by San- 
Segundo et al. [15]. 


Source 






eo ee Language 
Parallel Word Phrase Extraction Translation 
Corpora Alignment and Scoring Model 









Word 
Alignment 


Translation 
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Language aa Target 
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Figure 3. Diagram of phrase-based translation module [18] 
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The translation process uses a translation model based on phrases and a target language model as 
reported in San-Segundo et al. [19], [20], the GIZA++ software has been used to calculate the alignments 
between words and signs. San-Segundo et al. [15] reported that the statistical translation shows the worst 
outcome from the rule-based strategy. This condition is due to its restricted domain, and it has been possible 
to develop a complete set of rules with a reasonable effort. 


2.2. Edit distance techniques 
2.2.1. Levenshtein distance 

Levenshtein distance (LD) is a technique for looking for the differences between two different 
strings and computing the two different phonetic strings' distance. The basic technique of LD involves three 
main processes, which are insertion, deletion, and substitutions. The LD is a method of aligning two phonetic 
segments. The enhancement was implemented in prior research, allowing only alignments of consonants with 
consonants and vowels with vowels [21]. 

The LD is a popular string metric used to evaluate strings on orthographic similarity in information 
theory. LD counts minimal substitutions, insertions, and deletions to edit one string into another of any length 
[22]. For word pairs with equal word length, LD produces only distances smaller or equal to the Hamming 
distance [23]. The Hamming distance counts the minimal number of substitutions needed to edit one string 
into another equal length [24]. 

The implementation of LD provides the distance calculation of two varieties of words [25]. The LD 
was used in the analysis of linguistic variations in many other languages, for example, German [26], Dutch 
[27], Frisian [28], and Bulgarian [29]. The other successful implementation was tested on 15 Norwegian 
dialects perceptually and acoustically [30]. 


2.2.2. Mahalanobis distance 

Mahalanobis distance is the distance between two samples based on their mean feature vectors 
Haand up, and the covariance matrix X of the features across all samples in a database. The Mahalanobis 
distance is given as (3) [31]. 


Dyu (Ha, Hp) E (Ha — up)” È (ua — Hp) (3) 


The mahalanobis distance metric is scaled according to the precision matrix (the covariance matrix's 
inverse) [32]. It provides a way of reducing the influence of distances along dimensions irrelevant to the 
current descriptive word and normalizing distances across different feature spaces to create a single distance 
value for object classification. The mahalanobis distance metric can be seen as a feature weighting within 
dimensions of features and exclusive features. For example, the lightness dimension of color space varies 
more than the color dimensions for a given color word. Therefore, distance in the lightness dimension has a 
reduced effect on classification. Scaling features in this manner also allow us to combine disjoint features of 
varying dimensions and distributions, allowing greater flexibility for future features [33]. 

Mahalanobis distance is essentially a distance measure based on correlations between variables by 
which different patterns can be identified and analyzed. It is a useful way of determining the similarity of an 
unknown sample set to a known one. Distance-based approaches calculate the distance from a point to a 
particular point in the data set. Distance to the mean, the average distance between the query point and all 
points in the data set, the maximum distance between the query point and data set points are examples of the 
many options. Whether a data point is close to the data set depends on the user's threshold [34]. 

Mahalanobis distance is a distance between two points x = (x1, X2, ..-, Xp)® and y = (Y1, Yz» --, Yp)* 
in the p dimensional space R, is defined as (4) [35]. 


d(x,y) = 4 (x — y)tS-t (x — y) (4) 


Where d,(x,0) = ||x]|, = vxtS71x is the norm of x and S~* is a positive semi-definite covariance matric. 
Points with the same distance of the origin ||x||, = c satisfy x'S5~'x = c? which is the general equations of 
an ellipsoid centered at the origin, and we are interested in the distance of an observation from its center x 
given by (5). 


d(x, x) = y (x — X)'S~*(x — x) (5) 


The mahalanobis distance's drawback is the equal adding up of the variance normalized squared 
distances of the features. In the case of noise-free signals, this leads to the best possible performance. But 
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suppose the feature is distorted by noise due to the squaring of the distances. In that case, a single feature can 
have such a high value that it covers the other features' information and leads to a misclassification [35]. 

Therefore, to find classification procedures more robust to noise, we have to find a distance measure 
that gives less weight to the noisy features and more weight to the clean features. It is reached by comparing 
the different input features to decide which feature should be given less weight or excluded and have more 
weight [36], [37]. 


2.2.3. Soundex distance 

Phonetic encoding techniques consider a word phonetic transcription for classification and coding 
purposes, such as correcting eventual spelling mistakes and classifying phonetically digital libraries, 
dictionaries, and databases [38]. The phonetic representation has several applications. First, it allows to 
search concepts based on pronunciation rather than spelling [38]-[40]. 

The soundex phonetic technique was mainly used in applications involving searching people's 
names like air reservation systems, censuses, and other tasks presenting typing errors due to phonetic 
similarity [41]. Schütze et al. [42] reported that the soundex technique evaluates each letter in the input word 
and assigns a numeric value that converts each word into a code made up of four elements [43]. Thus, 
soundex uses numeric codes for each letter of the string to be codified, as shown in Table 1. 


Table 1. Soundex phonetic codes for English alphabet 


Numeric Code Letter 
0 a, 1, U, €, O, y 
1 b, p, f, v 
2 Cc, g, J, k, g, 8, Z 
3 d, t 
4 L 
5 m, n 
6 r 


2.2.4. N-grams distance 

An N-gram is a sub-sequence of N items from a given sequence. N-grams are used in various areas 
of statistical natural language processing and genetic sequence analysis. The items in question can be 
characters, words, or base pairs according to the application. For example, this N-gram output can be used for 
statistical machine translation and spell checking [44]. 

Pattern extraction is the process of parsing a sequence of items to find or extract a particular pattern 
of items. Pattern length can be fixed, as in the n-gram model, or it can be variable. Variable-length patterns 
can be directives to specific rules, like regular expressions. However, they can also be random and depend on 
the context and pattern repetition in the patterns dictionary [45]-[47]. 


3. PROPOSED ADAPTIVE LANGUAGE PROCESSING UNIT 

The proposed approach, shown in Figure 4, involves text classifiers, where they classify text input to 
its corresponding word tagging. The system implements NLP, HMM, and Bayesian as an adaptive 
combination module. The HMM and Bayesian Network are implemented together to cover the various 
lengths of the input text, in which the Bayesian Network handles longer sentences, while HMM handles 
shorter sentences. Such an adaptive selection of classifiers in the proposed system allows for longer sentences 
to be performed accurately. 

The proposed system language processing unit contains word identification and tagging and an 
adaptive classifier, automatically selecting the classifier, either HMM or Bayesian Network, based on the 
number of words detected. The proposed system is evaluated based on its success rate and processing time. 
The success rate indicates whether the output from the language processing unit (LPU) contains all the 
important words with the correct structure (subject-predicate-object) or not. If all-important words detected 
from the speech are included in the output, then the system is considered a success. Otherwise, it is 
considered a failure. The processing time is defined as the LPU's measured time, starting from inputting data 
until obtaining the output from the LPU. 
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Language Processing Unit 


Speech : Natural Sentence 
Recognition N-grams Language Length l 
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Figure 4. Block diagram of the proposed adaptive language processing unit 


3.1. N-grams 

In computational linguistics, a sequence of n contiguous words is called N-gram [46-48]. The 
system implements a combination of n-grams and NLP. The proposed system, LPU, is based on n-grams and 
NLP. Figure 5 shows that the output from N-grams is fed to the LPU. Let the input to the LPU be on the 
group of words (n = 2). NLP subsequently processes the processed text input to get the proper sentences 
represented by a particular sign language's signs. In an N-gram, sentences are truncated to the length (n — 1) 
and its truncation probability is defined as (6). 


p(w; |W, ..., Wi-a) = POW IWi-ns1 = Wi-1) (6) 


Since n = 2, it has been called a bigram. In such case, the N-gram conditional probabilities 
p(w,|w;_1) can be estimated from raw text C(w,,_,W,,) based on the relative frequency of word sequences 
C(wn-1) as (7) and (8). 


C(Wn- ) 
P(Wy|Wn-1) = ee (7) 
= C(w"7! w 
P(walwi ha) = Trta (8) 


3.2. Natural language processing 

The LPU performs the text-translated process according to Sign Language's grammatical rules, in 
this case, the Malaysian sign language (MSL). First, the proposed method identifies the "important" words 
that the proposed SL synthesizer synthesizes. Then, it utilizes the "tagging" process, which labels the input 
word into specific structure categories, 1.e., subject, predicate, and object (S-P-O). 

NLP is the essential process in the LPU. Figure 5 shows the detail of the NLP. It involves a 
tokenizer, POS tagger, named entity extraction, stemmer, and lexical transfer. The steps are required to 
enhance the translation and the output of the system. 


To Sentence 


j Lexical 
Tokenizer POS Tagger Mamed rauty Stemmer Length 
Extraction Transfer 
Calculator 












Text Input 


Figure 5. Word processing of the natural language processing 
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3.2.1. Tokenizer and POS tagger 

Tokenising is a basic operation of NLP that is applied to an input text. It breaks up a stream of 
characters into words, punctuation marks, numbers, and other discrete items. Figure 6 shows the flowchart of 
the tokenizer. This NLP stage does not classify grammatical categories of the input text. It also does not 
consider any information on the syntactic structure of the text or the type of words in it (e.g., whether the 
words are verbs and nouns). The input of the tokenizer is the identified words (text input). In comparison, the 
output is the corresponding tokenized words. The last word is also tagged to indicate that it is the last token 
for the current input, and usually, the last word is a noun. 


Text awit Tokenizing Labelling 
oa Each Word Token 


Figure 6. Flowchart of tokenizer 


The POS tagger allows classifying the words into nine traditional word classes, 1.e., noun, verb, 
adjective, adverb, preposition, article, interjection, pronoun, and conjunction. In addition, each word entering 
the POS tagging is labeled according to its status in a sentence. Thus, it contributes to the translation process 
downstream [49], [50]. 

Each word entering the POS Tagging block is labeled according to its status in a sentence. Figure 7 
shows the tagging process where each word is tagged according to its particular label. Each token is tagged 
with its corresponding SPO category (1.e., as a subject, predicate, or object). The prepositions and other non- 
important words are not tagged and thus are discarded. For the same example as before, the tagged token as 
token 1 as "S" (subject), token 2 as "P" (predicate), token 3 as "O" (object), token 4 as "unknown," and token 
5 as "O" (object). 





Receive Tagging Token Yes 
Input Token Input 





Figure 7. Flowchart of POS tagger 


3.2.2. Named entity extraction 

Named entity extraction identifies types of tagged words, such as names of persons, verbs, and 
removes words with unknown tagged tokens, as shown in Figure 8. The syntactic analysis allows the 
identification of the SPO syntactic components of the sentence. For example, if the tokenized word is a noun, 
the syntactic analysis validates the appropriateness of the tag given to the tokenized words. 


Tagged Token Syntactic 
Input Analyzer 










Verify 
Tagged 
Token? 


Removal Token 
= “Unknown” 


Figure 8. Flowchart of the named entity extraction and syntactic analyzer 
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For the same example, the processed token by NEE (after POS tagger) is resulted as token 1 as "S" 
(subject), token 2 as "P" (predicate), token 3 as "O" (object), and token 5 as "O" (object). Then, after passing 
the Syntactic Analyzer process, the resulted token is token 1 as "O" (object), token 2 as "P" (predicate), token 
3 as "S" (subject), and token 5 as "S" (subject). 


3.2.3. Stemmer and lexical transfer 

The stemmer is used to identify basic forms (stems) of words allowing to infer gender information 
and the number of input words. It is aimed to map a speech (text) in a particular language to a corresponding 
sign in the target SL. It uses a language dictionary to perform an accurate reduction to root words. Figure 9 
shows the flowchart for the stemming process. Stemming uses ordinary pattern matching to strip suffixes of 
tokens simply (e.g., remove "-s," and remove "-ing,", in the word endings), thus "stripping off" typical 
grammar. In the proposed system, the stemmer is used to identify verbs only, tagged as Predicate. This stage 
is less used in some languages (including Bahasa Melayu), where no tense-dependent changes of root words 
are needed. However, affixes may be used to give extra emphasis to the meaning of the root words. Also, 
they might be applied to derive new words (usually-verbs) that have different meanings though still relate to 
the root ones. For example, token 2 "dimakan" is tagged as "P." The tagging "P" refers to the verb, where it 
removes "di" as a prefix for the word "makan." token 2 changes into "makan" and with the same tag "P" by 
having this process. 

The lexical transfer involves one-to-one mapping of the input sentences to their corresponding 
meaning. It requires referring to the dictionary and word database, see in Figure 10. Specifically, the stage 
allows distinguishing words having multiple meanings. If the words have two or more meanings, the 


meaning based on the SPO tagging information of the sentence is selected. 







Token as 
Verb? 






















Check Meaning 
of The Words 


Database 





Figure 10. Flowchart of the lexical transfer 


3.3. Sentence length calculator and adaptive selection of the classifier 

The sentence length calculator (SLC) is a string processing that calculates the number of words in a 
sentence. In this research, any input given to the systems is limited to be one complete sentence, and the word 
count is the number of words in the input sentence. Figure 11 shows the flowchart for the SLC. In addition, 
the proposed system offers an adaptive selection of the classifier, allowing for switching between HMM and 
Bayesian networks automatically. 

The HMM and Bayesian Network identify the sentence and process it such that it follows the pre- 
assigned order, 1.e., subject, predicate, and object (SPO), based on the number of words in the sentence. 
Based on the experimental result, HMM works for short sentences (threshold = 7 words), and Bayesian 
networks applies for longer sentences (more than 7 words). Once the resulting sentence has been arranged 
into the SPO order, the sentence is fed into the animation part, which allows the animated avatar. 
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Figure 11. Flowchart of the sentence length calculator 








HMM states and transition matrix are designed to organize results into Subject, Predicate, and 
Object patterns. HMM involves three states and a single output. It has three observation probability 
distributions B and three state transition probabilities A. for each state, B emits a single output V. The 
mathematical model is described as follows: 


a) à = {A, B, mn} 
b) A= {a11 Q12, A22, A23, A33} 
c) B = {subject, predicate, object} 


The initial condition z is defined as the basic words, which, in our case, has three words, 1.e., subject 
(S), predicate (P), and object (O). "subject" is defined as a person or something which does the action. 
"Predicate" is classified as an active verb that indicates the activity which is done by the subject. Finally, 
"object" is classified as a person or object, which is the subject's goal. The SPO structure is proposed to ease 
the synthesizer process, where the complete sentence should consist of these three basic elements SPO. 

The Bayesian network is applied to a condition where the number of words is more than 7. Then, the 
joint probability distribution represents the implementation of the Bayesian network. In this case, the 
probabilistic distribution is the relationship among subject (S), predicate (P), and object (O). The joint 
probability distribution can be written as (9). 


p(S,P,0) = p(S)p(P/S)p(O|S, P) (9) 


The CPD is calculated from the joint probability, and the Bayesian network consists of class variables and 
feature variables that are readily applicable to the classification task. The S (subject) is selected as the class 
variable, and the calculation for the probability of S = T given any observed value set (P, O) as (10). 


p(S=T,P,O) 


PO lO) eae one) 


(10) 


Where p(S = T|P,O) and p(S = F,P,0O) can be computed efficiently using (9). Similarly, It can be applied 
to calculate p(S = F,P,O). Then the value of S is determined by computing p(S = F,P,O) and p(S = 
T,P,O). Once the resulting sentence is arranged into the SPO order, it is fed into the animation avatar (or 
robotic manipulator), outputting it in the chosen SL lexis. 


4. EXPERIMENTAL RESULTS AND DISCUSSION 
4.1. Experimental setup 

The data used is text-based input, which is considered a simple Malay sentence. However, 
combining the number of words in one sentence, from three to ten, indicates input sentences from simple 
sentences into more complex sentences. Table 2 shows 13 sample data selected from the 130 data used in this 
experiment. The simple structure consists of only SPO structure, whereas the more complex sentences have 
SPO structure and random order of sentence structures. The latter is introduced to evaluate the proposed SL 
synthesizer technique based on the processing time and success rate. 

The proposed system has been developed using Matlab 2018a and running at Intel 15 5200U 
processor and 4 GB RAM to process the output from speech data. To evaluate the performance of the system, 
we performed four types of comparisons. The first comparison involves evaluating four different edit 
distance methods, namely Levenshtein, soundex, N-grams, and mahalanobis. The second comparison 
consists of the selection of the number of parameters used for the selected edit distance. The next comparison 
is made to evaluate the NLP and HMM's performance with NLP and Bayesian Network. Then, we can 
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determine the threshold number of words in the input sentences for the adaptive classification. Finally, the 
last evaluation is on the edit distances with the adaptive system of the classifier. 


Table 2. Sample input selected for language processing unit 


Number 
No Selected Input of Words 
1 Saya Makan Nasi (I eat rice) 3 
2 Nasi Saya Makan (Rice I eat) 3 
3 Dia Makan Nasi (He/she eats rice) 3 
4 Dia Pergi ke Pasar (He/she goes to Market) 4 
5 Ke Pasar Saya Pergi (To market I go) 4 
6 Dia Makan Nasi di Pasar (He/she eats rice at market) 5 
7 Saya dan Dia Makan Nasi (I and He eat rice) 5 
8 Nasi Saya dan Dia Makan (Rice I and he/she eat) 5 
9 Saya Dia Pergi Makan Pasar Nasi (I, he go eat market rice) 6 
10 Saya Makan Nasi dan Pergi ke Pasar (I eat rice and go to market) 7 
11 Saya Makan Nasi Dan Membeli Buah Di Pasar (J Eat Rice and Buy a Fruit at The Market) 8 
12 Saya dan Dia Makan Nasi dan Buah Di Pasar (J and He/She Eat Rice and Fruit in The Market) 9 
13 Saya Makan Nasi dan Pergi ke Pasar di Pagi Hari (I Eat Rice and Go to the Market in The Morning) 10 


4.2. Selection of edit distance and its characteristics 

Each edit distance is used in the proposed system, and the performance of each is compared based 
on its success rate and processing time. The selection is performed to determine the most effective edit 
distance and its characteristics value. The edit distance, which provides the highest success rate and the 
shortest processing time, is selected. The evaluation is carried out by comparing the processing time of each 
edit distance. It is necessary for getting a fast response for the whole system for the SL synthesizer. The 
comparison is performed by varying the number of words in a sentence. The comparison aims to find the 
significant edit distance to support the SL synthesizer's proposed language processing unit. In this research, 
determining the most suitable edit distance technique is key for the proposed system to provide the fastest 
processing time and the highest success rate. 

Figure 12 shows the performance of each edit distance with the various number of words in a 
sentence. It indicates that Levenshtein, soundex, and N-grams provided similar processing times. The 
similarity of processing time resulting from three edit distances is expected due to those similar 
characteristics. As n-grams have provided the fastest processing time than mahalanobis and soundex 
distance, N-grams have been selected to be implemented in the proposed system's language processing unit. 

Figure 13 shows the time comparison for Levenshtein, soundex, and n-grams. Levenshtein and 
soundex distance resulted in a longer processing time than n-grams distance. It is demonstrated that 
Levenstein, soundex, and mahalanobis have a longer processing time than n-grams. N-grams, at n = 2, 
provide efficient processing time compared to other edit distances. Therefore, N-grams are implemented for 
the language processing unit. 
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Figure 12. Time comparison for four edit distance in various number of words 
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Figure 13. Time comparison for Levenshtein, soundex, and N-grams edit distance 


N-grams distance is set by the number n, which shows the grouping of words from a set of 
sentences. The comparison of the processing time for n-grams at various numbers n provides different 
processing times from LPU. Figure 14 shows the comparison of the processing time for different values of n- 
grams over the number of words. It can be seen that n-grams at n = 2 provide the fastest processing time, 
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Figure 14. Processing time for N-grams at n = 1, and n = 2 over various number of words 


The success rate over the various number of words in a sentence is shown in Figure 15. The 
different number of n produces multiple success rates. N-grams at n = 2 provide a higher success rate with 
more than 80% success rate than n-grams at n = 1 (less than 80% for 10-word input sentences). The success 
rate at n = 2 decreases 9.11% when the number of words increases from 3 to 10. 
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4.3. Performance Evaluation 
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Figure 15. Success rate for N-grams at n= 1, and n = 2 over various number of words 


The implementation of n-grams and HMM provides the fastest processing time for the proposed 
LPU. Figure 16 provides the processing time average success rate for LPU using HMM and Bayesian 
network for various numbers of words. The processing time required for both techniques are varied 
depending on the number of words in the input sentence. The adaptive implementation of HMM and 
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Bayesian network is in place to achieve the fastest processing time depending on the number of words in the 
input sentence. 

The processing time increases over various words, and the success rate decreases over multiple 
words. The processing time for LPU using HMM increased by 4.5 times when there was an increase in the 
number of words from 3 to 10. The processing time for LPU using Bayesian network increased by 2.2 times 
longer when there was an increase in the number of words from 3 to 10. For the number of words between 3 
to 7 words, LPU using HMM provides an average of 69.89% processing time than NLP with Bayesian 
network. On the other hand, when LPU uses Bayesian network, the number of words is 7 to 10 words 
average 88.35% processing time than NLP with HMM. In addition, the average success rate starts at (number 
of words = 3) 93.75% and slightly decreases up to 84.49% at 10 words in a sentence. 
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Figure 16. Processing time for the LPU using HMM and Bayesian network 


5. CONCLUSION 

This paper has investigated the utilization of the n-grams, NLP, HMM, and the Bayesian network 
for the classifier. It involved measuring the processing time and success rate compared to other edit distances. 
The proposed system has adapted n-gram as the edit distance and combined it to HMM and Bayesian 
network to obtain efficient (fast) processing time and a high success rate. Evaluation and selection of 
appropriate edit distance were achieved by comparing the processing time of each selected edit distance. As a 
result, n-grams have been selected to be implemented on the LPU. Furthermore, the adaptive language 
processing Unit has implemented n-grams with NLP, HMM, and Bayesian network for the proposed 
algorithm, and it has been enabled by implementing SLC to decide either HMM or Bayesian network is used 
to adjust the output to match the SPO sequence. The proposed system's performance was reached over 
various sentence lengths. The processing time and success rate reveal that the proposed system outperforms 
the other edit distances. As a result, the proposed LPU could achieve around 1.449 ms processing time with 
an average success rate of 84.49% for a maximum of ten-word sentences. 
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